Preparation of nucleic acid samples

ABSTRACT

The presently claimed invention provides methods, compositions, and apparatus for studying nucleic acids. Specifically, the present invention provides a novel enrichment and labeling strategy for ribonucleic acids. In one embodiment, the invention provides enriching for a population of interest in a complex population by diminishing the presence of a target sequence. In a further embodiment, the invention can be used to reproducibly label and detect extremely small amounts of nucleic acids.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/162,739, filed Oct. 30, 1999, and U.S. Provisional Application No. 60/191,345, filed Mar. 22, 2000, both of which are fully incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Novel methods for enriching and labeling nucleic acids are needed. For example, gene expression analysis techniques often employ isolation and labeling of ribonucleic acid (RNA). Because of the interest in identifying protein-encoding genes and in examining gene expression levels, it is often desirable to purify or enrich the messenger RNA (mRNA). The poly-adenine 3′-terminus (poly-A tail) of mRNA from eukaryotic cells can be used as a handle to bind to poly(dT) oligonucleotides, and this method is widely used to identify, purify and/or label eukaryotic mRNA. However, because prokaryotic mRNA generally lacks poly-A tails, there is a need for alternative methods for purifying and labeling mRNA samples which do not rely on the existence of a poly-A tail.

SUMMARY OF THE INVENTION

The presently claimed invention provides methods of preparing a nucleic acid sample for analysis.

In a first embodiment, the presently claimed invention provides a method of preparing a nucleic acid sample for analysis comprising enriching for a population of interest within a mixed population of nucleic acids by contacting the nucleic acid sample with a bait molecule. The bait molecule is capable of complexing specifically to unwanted target sequences within the nucleic acid sample, but is incapable of complexing with sequences from the population of interest. The bait molecule is contacted with the target sequences, forming bait:target complexes which are then specifically removed from the nucleic acid sample. The remaining enriched population of interest is then fragmented and a signal moiety is attached to the fragments.

In a second embodiment, the presently claimed invention provides a method of enriching for a population of interest within a mixed population of nucleic acids by contacting the nucleic acid with a bait molecule. The bait molecule is capable of complexing specifically to unwanted target sequences within the nucleic acid sample, but is incapable of complexing with sequences from the population of interest. The bait molecule is contacted with the target sequences, forming bait:target complexes which are then specifically removed from the nucleic acid sample, thus enriching for the population of interest.

In a third embodiment, the presently claimed invention provides a compound having the formula:

n-S-acetyl-PEO-sig

where n is a polynucleotide, S is a thiol group, acetyl is an acetyl functional group, PEO is polyethylene oxide, and sig is a signal moiety.

In a fourth embodiment, the presently claimed invention provides a method for labeling a polynucleotide comprising contacting the polynucleotide with a PEO-iodoacetyl conjugated to a signal moiety under conditions such that the PEO-iodoacetyl will attach to said nucleotide.

In a fifth embodiment, the presently claimed invention provides a method for labeling a polynucleotide comprising: contacting the polynucleotide with a reactive thiol group to form a thiolated polynucleotide and contacting the thiolated polynucleotide with a signal moiety capable of reacting with said thiolated polynucleotide under appropriate conditions such that said signal moiety is attached to said polynucleotide.

In a sixth embodiment, the presently claimed invention provides a method for labeling prokaryotic mRNA comprising: obtaining a population of RNA from a prokaryotic organism; enriching the population for mRNA by exposing the population to a plurality of DNA bait molecules which are complementary to at least a portion of the stable RNA in said population under such conditions as to allow for the formation of DNA:RNA hybrids; exposing the DNA:RNA hybrids to RNAse H to remove the RNA from said DNA:RNA hybrids; exposing the remaining DNA to DNase I to remove the DNA, thus producing an enriched population of mRNA; fragmenting the enriched mRNA to form mRNA fragments; exposing the mRNA fragments to γ-S-ATP and T4 kinase to produce reactive thiol groups at the 5′ ends of the mRNA fragments; and exposing the thiolated mRNA fragments to PEO-Iodoacetyl-Biotin such that a stable thio-ether bond is formed between said thiolated mRNA fragments and said PEO-Iodoacetyl-Biotin.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a schematic illustration of one embodiment of the presently claimed invention in which target sequences are depleted from a mixed population of nucleic acids.

FIG. 2 depicts a schematic illustration of one embodiment of the presently claimed invention wherein target sequences are complexed to a bait molecule and then specifically digested.

FIG. 3 depicts a schematic illustration of one embodiment of the presently claimed invention wherein bait molecules are synthesized by reverse transcriptase using target molecules as templates.

FIG. 4 depicts a schematic illustration of one embodiment of the presently claimed invention in which bait molecules are recycled to initiate repeated rounds of target depletion.

FIG. 5 depicts a schematic illustration of one embodiment of the presently claimed invention in which sequences from an enriched population of interest are labeled.

FIG. 6 is an image of unenriched RNA hybridized to a microarray.

FIG. 7 is an image of enriched RNA hybridized to a microarray.

FIG. 8 is a gel image showing the depletion of 23S and 16S RNA using the methods of the presently claimed invention.

FIG. 9 is a gel image showing the depletion of 23S and 16S RNA using the methods of the presently claimed invention including bait cycling.

FIG. 10 is an image of a Northern transfer showing the amount of mRNA transcript present during each round of rRNA depletion during a bait cycling experiment.

FIG. 11 is a gel image of biotin labeled mRNA fragments.

FIG. 12 is a gel image of a gel shift assay.

FIG. 13 depicts hybridization patterns of E. coli RNA labeled with the thiol-kinase dependent (panel A) and thiol-kinase independent (panel B) methods.

FIG. 14 shows the average difference correlation comparing the results of two different thiol-kinase dependent experiments to each other.

FIG. 15 shows the average difference correlation comparing the results of two different thiol-kinase independent experiments to each other.

FIG. 16 shows the average difference correlation comparing the thiol-kinase dependent experiments with the thiol-kinase independent experiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Definitions

The phrase “massively parallel screening” refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.

The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, would encompass analogs and mimetics of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. Nucleic acids may be derived from a variety of sources including, but not limited to, natural or naturally occurring nucleic acids or mimetics thereof, clones, synthesis in solution or solid phase synthesis.

An “oligonucleotide” or “polynucleotide” is a nucleic acid ranging from at least 2, preferably at least 8, and more preferably at least 20 nucleotides in length or a compound that specifically hybridizes to a polynucleotide. Polynucleotides of the present invention include sequences of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) which may be isolated from natural sources, recombinantly produced or artificially synthesized and mimetics thereof. A further example of a polynucleotide of the present invention may be peptide nucleic acid (PNA). The invention also encompasses situations in which there is a nontraditional base pairing such as Hoogsteen base pairing which has been identified in certain tRNA molecules and postulated to exist in a triple helix. “Polynucleotide” and “oligonucleotide” are used interchangeably in this application.

“Subsequence” refers to a sequence of nucleic acids that comprise a part of a longer sequence of nucleic acids.

The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). Standard conditions are described in, for example, Sambrook, Fritsch, Maniatis “Molecular Cloning: A Laboratory Manual” (1989) Cold Spring Harbor Press.

The term “mRNA” or “mRNA transcripts,” as used herein, includes, but is not limited to, pre-mRNA transcript(s), transcript processing intermediates, mature mRNA(s) ready for translation and transcripts of the gene or genes, or nucleic acids derived from the mRNA transcript(s). Transcript processing may include splicing, editing and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the mRNA transcript and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, mRNA derived samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.

The term “signal moiety” refers in a general sense to a detectable moiety, such as a radioactive isotope or group containing the same, and non-isotopic moieties, such as enzymes, biotin, avidin, streptavidin, digoxigenin, luminescent agents, dyes, haptens and the like. Luminescent agents, depending upon the source of the exciting energy, can be classified as radioluminescent, chemiluminescent, bioluminescent, and photoluminescent (fluorescent).

The phrase “mixed population” or “complex population” refers to any sample containing both desired and undesired nucleic acids. As a non-limiting example, a complex population of nucleic acids may be total genomic DNA, total cellular RNA or a combination thereof. Moreover, a complex population of nucleic acids may have been enriched for a given population but include other undesirable populations. For example, a complex population of nucleic acids may be a sample which has been enriched for desired messenger RNA (mRNA) sequences but still includes some undesired ribosomal RNA (rRNA) sequences.

Throughout the disclosure various patents, patent applications and publications are referenced. Unless otherwise indicated, each is incorporated by reference in its entirety for all purposes.

2. General

In a first embodiment, the presently claimed invention provides a method of preparing a nucleic acid sample for analysis. It is often desirable to isolate, enrich, or increase the relative percentage of a particular population of sequences within a much larger population of sequences in order to limit analysis to those sequences of interest and to reduce interference and unnecessary work which may be caused by the presence of undesirable sequences. The methods of the presently claimed invention provide a novel method wherein a complex sample is depleted of undesired sequences and is thus enriched for a population of interest. One particularly preferred enrichment is to increase the relative percentage of prokaryotic mRNA in a given sample for further analysis.

Briefly, the method enriches for a population of interest within a mixed population of nucleic acid sequences by targeting undesired sequences (target sequences) and removing them from the mixed population. First, a mixed population of nucleic acid sequences is exposed to a bait molecule. The bait molecule is capable of complexing specifically to a target sequence but not to the sequences in the population of interest. The bait molecule is allowed to form a complex with the target sequence and this complex is then specifically recognized and removed. The removal process may be conducted in a single step, or may involve removing first the target sequences and then the bait molecule. In one particular example the bait molecules are short DNA sequences which are complementary to the target sequences.

FIG. 1 illustrates a general embodiment of the presently claimed invention. A mixed population 100 comprising a population of interest 102 and target sequences 101 is exposed to bait molecules 103. The bait molecules complex with the target sequences to form bait:target complexes 104. The bait:target complex is then removed from the mixed population, thereby enriching for the population of interest.
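The selection logic of FIG. 1 can be illustrated with a minimal sketch, under simplifying assumptions that are not part of the disclosure: the bait is a short DNA oligonucleotide, complexing is modeled as perfect Watson-Crick complementarity to some stretch of the target, and the function names and sequences are hypothetical placeholders rather than real rRNA or mRNA sequences.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def deplete(mixed_population, baits):
    """Return the members of mixed_population that no bait can complex with.

    mixed_population: dict mapping a (hypothetical) name to an RNA sequence.
    baits: list of DNA bait sequences.
    Assumption: a bait complexes with an RNA only if the bait's reverse
    complement occurs in the RNA (compared in the DNA alphabet, U -> T).
    """
    enriched = {}
    for name, rna in mixed_population.items():
        as_dna = rna.upper().replace("U", "T")
        complexed = any(reverse_complement(bait) in as_dna for bait in baits)
        if not complexed:
            enriched[name] = rna
    return enriched


# Illustrative, made-up sequences.
mixed = {
    "target_fragment": "GGAUCACCUCCUUA",
    "interest_fragment": "AUGGCUAGCAAAGGA",
}
print(deplete(mixed, ["GGAGGTGAT"]))
```

Real hybridization tolerates mismatches and depends on stringency, so exact substring matching is only a stand-in for the specific complexing described above.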

The mixed population of nucleic acids may be any nucleic acid sample comprising both desired and undesired sequences. The population may include different DNA or RNA molecules. In a preferred embodiment, the mixed population is an RNA sample; in a further preferred embodiment the nucleic acid sample is RNA derived from a prokaryotic organism. The mixed population may be derived from a wide variety of sources including, for example, tissue samples, blood, isolated cells or environmental samples such as water or soil. The mixed population may be derived from any organism including both eukaryotes and prokaryotes such as human, rat, mouse, Escherichia coli (E. coli), Bacillus subtilis (B. subtilis), Pseudomonas aeruginosa, etc. Methods of deriving nucleic acid samples from eukaryotic and prokaryotic organisms will be well known to those of skill in the art. See, for example, Chapter 4, “Current Protocols in Molecular Biology,” Ausubel et al., eds (1997 supplement) John Wiley & Sons, Inc. and Chapter 7, Sambrook, Fritsch, Maniatis “Molecular Cloning: A Laboratory Manual” (1989) Cold Spring Harbor Press, etc.

The population of interest may be any subset of the mixed population. The population of interest may include RNA and/or DNA. The population of interest may, for example, be a particular type of RNA. In a preferred embodiment the population of interest is mRNA. The population of interest may comprise any sequence and the sequence need not be known. The population of interest may be chosen on any basis, including by sequence, function (e.g., messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), etc.) or a combination thereof.

The target sequences may be any undesired sequences in the mixed population. The target sequences may comprise any sequence so long as they are distinguishable by sequence from the population of interest. Target sequences may be chosen on any basis, including by sequence, function (e.g., mRNA, rRNA, tRNA, etc.) or a combination thereof. In a preferred embodiment the target sequences are stable RNAs including rRNA and tRNA. In some embodiments, it may not be necessary to remove all the undesired sequences from the mixed population. In these embodiments it is acceptable to remove only enough of the undesired sequences such that the undesired sequences do not interfere with analysis of the population of interest. For example, in a prokaryotic expression study utilizing array hybridization techniques, it may be desirable to remove rRNA sequences which may interfere with hybridization of the mRNAs to the array by creating a significant background signal. In this example, it may be acceptable to remove only the 23S and 16S RNAs, as removing these sequences reduces background signals to acceptable levels. See, e.g., Example 1, below.
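The effect of partial depletion on the relative percentage of the population of interest can be worked through with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the example amounts (5 parts population of interest, 80 parts targeted sequences, 15 parts non-targeted sequences, 90% of targets removed) are made-up numbers, not measurements from this disclosure.

```python
def enriched_fraction(interest, target, other, removed):
    """Fraction of the sample that is population of interest after depletion.

    interest, target, other: amounts (any consistent unit) of the population
        of interest, the targeted sequences, and non-targeted sequences.
    removed: fraction of the targeted sequences that is removed (0.0 to 1.0).
    """
    if not 0.0 <= removed <= 1.0:
        raise ValueError("removed must be between 0 and 1")
    remaining_target = target * (1.0 - removed)
    total = interest + remaining_target + other
    if total <= 0:
        raise ValueError("sample must contain some nucleic acid")
    return interest / total


# Hypothetical amounts chosen only to illustrate the calculation.
before = enriched_fraction(5.0, 80.0, 15.0, removed=0.0)
after = enriched_fraction(5.0, 80.0, 15.0, removed=0.9)
print(f"before depletion: {before:.3f}, after depletion: {after:.3f}")
```

The calculation shows why complete removal is not always needed: the relative share of the population of interest rises as soon as the dominant undesired species is reduced.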

In a preferred embodiment any non-targeted undesirable sequences represent only a small proportion of the mixed population. These non-targeted undesirable sequences may include a variety of other nucleic acids such as DNAs, rRNAs, mRNAs or tRNAs. For the sake of simplicity, the presence of non-targeted RNAs will not be discussed throughout the remainder of the application; however, the possibility of their presence is contemplated by the scope of the presently claimed invention.

The bait molecules may be obtained and added by a variety of methods. The bait molecules should be able to recognize and complex specifically with the target molecule, but should not complex with the sequences from the population of interest. Moreover, the bait:target complex should have a particular property which makes it vulnerable to a selection and removal mechanism.

In one embodiment, the bait:target complex is targeted by an enzyme or process which specifically removes any target sequences which are complexed to a bait molecule. FIG. 2 depicts a schematic illustration of this embodiment. A mixed population 100 comprises a population of interest 102 and target sequences 101. Bait molecules 103 are introduced to complex specifically with the target sequences, forming bait:target complex 104. An enzyme or process 105 is introduced to specifically remove the target sequences from the bait:target complexes without interfering with the sequences from the population of interest. After removal of the target sequences, the mixed population is comprised of the population of interest and the bait molecules. If desired, the bait molecules may then be removed. (Step not shown.)

As one example, the bait sequence may be DNA and the target sequence may be RNA. In this example the bait:target complex would be a DNA:RNA hybrid. The DNA:RNA hybrid is then removed from the mixed population. For example, in some embodiments an enzyme which specifically targets DNA:RNA hybrids will be used to remove the DNA:RNA hybrid. In a preferred embodiment, RNAse H is used to specifically hydrolyze RNA which is part of a DNA:RNA hybrid. The remaining DNA is then available to hybridize with another RNA target sequence. If desired, the DNA may then be removed by addition of enzymes which specifically target and digest DNA. In a preferred embodiment DNAse I is used. Alternatively, physical or other methods of removal may likewise be employed, such as streptavidin to remove biotinylated DNA.

A particular example of the presently claimed invention provides a method of isolating or enriching for mRNAs within a mixed population of RNAs by specifically removing targeted rRNAs. A mixed population of RNAs includes mRNAs, tRNAs and rRNAs. DNA bait molecules which are complementary to the rRNAs but not to the mRNAs are added to the mixed population under conditions suitable to allow for the formation of DNA:RNA hybrids. Then, RNAse H specifically targets and removes any RNA which is part of a DNA:RNA hybrid, yielding DNA bait molecules and an enriched population of mRNAs.

If a DNA bait sequence is used, the DNA may be generated exogenously, chemically obtained, or synthesized from another biological source. Exogenous DNA may be generated by chemical or non-biological synthesis. Alternatively, exogenous DNA may be obtained through biological synthesis, for example, through the production by bacteria of double stranded plasmid DNA or single stranded phage DNA containing the bait sequence. Chemical or non-biological methods of synthesizing DNA will be known to those of skill in the art and are described in, for example, Innis et al. (eds.) (1990) PCR Protocols: A Guide to Methods and Applications, Academic Press; and Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford.

In a preferred embodiment, rather than adding exogenous DNA as a bait, DNA:RNA hybrids are synthesized “in vivo” using the targeted RNA as a template for reverse transcription. This embodiment is depicted in FIG. 3. Primers 106 which are complementary to the targeted RNA 101 are added to the mixed population 100. The primers are allowed to hybridize to the targeted RNAs, forming primer-bound targeted RNAs 107. The primers are extended by reverse transcriptase to form DNA:RNA hybrids 104, which may then be removed using any known method, including those methods described below, producing an enriched population of interest 102.

Alternatively, a non-nucleic acid bait molecule may be used. For example, an antibody which specifically recognizes and binds the target sequences may be employed in some embodiments of the presently claimed invention. For example, an antibody may be modified to recognize DNA:RNA hybrids or specific rRNA sequences.

The method of removal may exploit some inherent or modified element of the bait. For example, if the bait is distinguishable by size from the sequences in the population of interest, a method of size separation, such as centrifugation, size separation column, or gel electrophoresis, could be employed to remove the bait:target complexes.

Alternatively, the bait molecule can be modified with a selectable element, the properties of which may then be exploited in order to remove the bait:target complex from the mixed population. Non-limiting examples of selectable elements include: nucleic acid sequences, ligands, receptors, antibodies, hapten groups, antigens, biotin, streptavidin, enzymes and enzyme inhibitors. Once a bait molecule containing a selectable element is complexed to the target sequence, the bait:target complex is exposed to a reagent capable of binding said selectable element and the reagent:bait:target complex is removed from the mixed population.

For example, an antibody may be designed which specifically recognizes and binds rRNA sequences. The antibody may be biotinylated before or after exposure to the rRNA sequences. The biotinylated antibody:rRNA complex is then exposed to streptavidin-coated magnetic beads. The magnetic beads with the antibody:rRNA complex attached may then be removed from the mixed population.

In some embodiments, the bait molecules may be attached to a solid substrate such as beads, fibers, or an array. The bait molecules may be attached to the solid substrate using any known method including chemical or physical attachment. For example, nucleic acid sequences may be synthesized directly on the solid support (see, e.g., Merrifield, “Solid Phase Peptide Synthesis,” J. Am. Chem. Soc., (1963) 85:2149-2154, Fodor et al., “Light Directed Spatially Addressable Parallel Chemical Synthesis” Science (1991) 251:767-773, PCT publication WO90/15070, and U.S. Pat. Nos. 5,800,992, 5,445,934, 5,837,832 and 5,744,305) or pre-synthesized and then attached to the solid support (see, e.g., PCT publication No. WO92/10092 and U.S. Pat. Nos. 5,677,195, 5,412,087, 6,022,963 and 6,040,193).

For those embodiments employing bait molecules attached to solid supports, enzymatic removal of the bound target sequences may be employed if there is a desire to recycle the bait molecules. The method of removing the solution from the solid supports may include any manual or mechanical means, including pipetting or draining in a fluidics station, so long as the solution is obtained in a manner so as to preserve the integrity of the sequences of interest. Otherwise, as indicated above, one may simply remove the solid support containing the bound target sequences, thereby removing the target sequences (and the bait molecules) and enriching for the population of interest.

In practice, the method of removal will vary depending on the type of solid support used. For example, if the solid support is an array, the unbound sequences may simply be washed off the support and the solution collected. If the solid support is a bead, the beads may be removed from solution by centrifugation. If the solid support is a magnetic bead, the beads may be removed from solution by exploiting the magnetic properties of the beads. Regardless of the method used, the solution containing the unbound sequences is isolated from the solid support-bound bait:target complexes.

FIG. 4 depicts another embodiment of the presently claimed invention in which the same bait molecule is used for repeated rounds of target depletion. In FIG. 4, a mixed population of nucleic acids 100 includes the population of interest 102 and targeted sequences 101. Bait molecules 103 which are complementary to the targeted sequences but not to the sequences in the population of interest are added to the mixed population under conditions suitable to allow formation of bait:target complexes 104. Next, an enzyme or process 105 specifically targets and removes the target sequence from the bait:target complexes, leaving the population of interest 102, DNA bait molecules 103 and any undigested target sequences 101. The remaining DNA bait molecules are then free to hybridize with any undigested target sequences to form new bait:target complexes, thereby repeating the first step. The cycle can then be repeated as desired.

A preferred mechanism for carrying out repeated recycling of DNA bait molecules employs cycling of different conditions. As above, a mixed population of nucleic acids includes a population of interest and target sequences. First, bait molecules are added to the mixed population under conditions suitable to allow formation of bait:target complexes. This first step is performed under a first condition, for example at a temperature X. Second, an enzyme or process which specifically targets and removes target sequences which are part of a bait:target complex is added, yielding bait molecules and the population of interest. This second step is performed under a second set of conditions which are different from the conditions required for the first step, e.g., if the first step is performed at temperature X, the second step is performed at temperature Y where Y≠X. Conditions are then returned to those in the first step (e.g., the temperature is returned to X) and the bait molecules are allowed to complex with any target sequences that were not removed in the previous step. The conditions and steps are cycled in this manner until the desired amount of target sequence is removed. In this embodiment, the same bait molecules serve as bait for numerous rounds of target depletion. At the end of the cycling process, the bait molecules may be removed by an enzyme or process which specifically targets and removes the bait. Note that the initial bait molecules may be introduced by reverse transcribing the target sequences as described above and depicted in FIG. 3.
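The idea of cycling "until the desired amount of target sequence is removed" can be made concrete with a simple model. The sketch below assumes, purely for illustration, that every cycle removes the same fixed fraction of whatever target remains; that constant per-cycle efficiency, the function name, and the example values are assumptions and are not taken from this disclosure.

```python
def cycles_needed(per_cycle_removal, residual_goal):
    """Count cycles until the remaining target fraction is at or below a goal.

    per_cycle_removal: assumed fraction of remaining target removed per
        cycle (greater than 0, at most 1).
    residual_goal: acceptable fraction of the original target still present
        (greater than 0).
    """
    if not 0.0 < per_cycle_removal <= 1.0:
        raise ValueError("per_cycle_removal must be in (0, 1]")
    if residual_goal <= 0.0:
        raise ValueError("residual_goal must be positive")
    remaining = 1.0
    cycles = 0
    while remaining > residual_goal:
        remaining *= 1.0 - per_cycle_removal  # one anneal + digest round
        cycles += 1
    return cycles


# Hypothetical efficiency: half of the remaining target removed per cycle.
print(cycles_needed(per_cycle_removal=0.5, residual_goal=0.1))
```

In practice the efficiency of each round would have to be determined empirically, for example from gel images of the depleted sample.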

In a particular example of the above embodiment, a mixed population of RNAs includes mRNA, 23S rRNA and 16S rRNA. Cloned ribosomal DNA (rDNA) bait molecules which are complementary to the 23S and 16S rRNAs are added to the mixed population under conditions suitable to allow for the formation of DNA:RNA hybrids. In a preferred embodiment, the rRNA and rDNA annealing reaction is performed at a temperature range of between 37° C. and 95° C., more preferably between 50° C. and 80° C. and most preferably at 70° C. Next, a thermostable RNAse H is added to digest the bound rRNA sequences. In a preferred embodiment this step is performed at a temperature range of between 37° C. and 70° C., more preferably at a temperature range of between 40° C. and 60° C. and most preferably at 50° C. The digestion yields rDNAs, mRNAs and undigested rRNAs. Thereafter, the temperature is raised to a temperature suitable for reannealing, e.g. 70° C., and the annealing step is repeated. Thereafter, the temperature is changed to a temperature suitable for digestion, e.g. 50° C., and the digestion step is repeated. In this manner, the temperature can be cycled to allow for repeated targeting of rRNA molecules by the same DNA bait molecule. It should be noted that it is not necessary to employ different temperatures or conditions to conduct bait cycling, as the DNA bait will become available once the RNA target sequence is removed by RNAse H. However, temperature cycling may promote higher specificity and is, therefore, a preferred embodiment for certain applications requiring high specificity.
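The temperature cycling described above can be written out as an explicit schedule. The following is a minimal sketch: the default temperatures (70° C. anneal, 50° C. digest) and the allowed ranges (37 to 95° C. and 37 to 70° C.) come from the preferred embodiment above, while the names `Step` and `bait_cycling_schedule` are hypothetical and hold times are omitted because none are stated here.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    cycle: int            # 1-based cycle number
    name: str             # "anneal" or "digest"
    temperature_c: float  # target temperature in degrees Celsius


def bait_cycling_schedule(cycles, anneal_c=70.0, digest_c=50.0) -> List[Step]:
    """Build an alternating anneal/digest schedule for bait cycling."""
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    # Broad preferred ranges stated in the text above.
    if not 37.0 <= anneal_c <= 95.0:
        raise ValueError("annealing temperature outside 37-95 C")
    if not 37.0 <= digest_c <= 70.0:
        raise ValueError("digestion temperature outside 37-70 C")
    schedule = []
    for cycle in range(1, cycles + 1):
        schedule.append(Step(cycle, "anneal", anneal_c))  # bait binds rRNA
        schedule.append(Step(cycle, "digest", digest_c))  # RNAse H digestion
    return schedule


for step in bait_cycling_schedule(cycles=3):
    print(f"cycle {step.cycle}: {step.name} at {step.temperature_c} C")
```

The sketch deliberately permits equal annealing and digestion temperatures, since the text notes that different temperatures are not strictly required for bait cycling.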

In a preferred embodiment, once both the targeted RNA and DNA bait molecules have been digested, the RNA of interest is further purified using methods known in the art, including, for example, commercially available purification kits such as the MasterPure complete DNA/RNA purification kit (Epicentre Technologies, WI) or the RNeasy Kit (Qiagen, Valencia, Calif.).

Once the population of interest is enriched, it is often desirable to label the sequences in preparation for a number of different analyses. In one embodiment of the presently claimed invention, the enriched population of interest is fragmented and labeled. In the methods of the presently claimed invention the label is a signal moiety. In a preferred embodiment the label is a biotin and in an even further preferred embodiment the label is a PEO-Iodoacetyl biotin.

Generally under the methods of the presently claimed invention, the fragmented sequences of interest are chemically modified such that the 5′ ends comprise a reactive group. The reactive group is then reacted with the signal moiety to produce labeled fragments. In an alternate method, the 5′ end modification step is skipped and the fragments are directly labeled with the signal moiety.

FIG. 5 depicts a specific example of one embodiment of the presently claimed invention in which enriched fragments are biotin labeled. A mixed population of nucleic acids 100 includes a population of interest 102 and target sequences 101. Bait molecules 103 are added to the mixed population under conditions suitable for formation of bait:target complexes 104. The bait:target complexes are removed, leaving an enriched population of interest. If desired, the sequences from the population of interest may be further purified by known purification means (not shown). The sequences from the population of interest are then fragmented, producing fragments 108. The fragments are then chemically altered to add a reactive group 109 to the 5′ end of each fragment, producing reactive fragments 110. Finally, a signal moiety 111 is reacted with the reactive groups to produce labeled fragments 112.

Any known method of fragmentation may be employed. Various methods of fragmenting nucleic acids will be known to those of skill in the art. These methods may be, for example, either chemical or physical in nature. Fragmentation may include partial degradation with a DNAse or RNAse, partial depurination with acid followed by heating, or digestion with restriction enzymes or other enzymes which cleave nucleic acid at known or unknown locations. Physical fragmentation methods may involve subjecting the nucleic acid to a high shear rate. High shear rates may be produced, for example, by moving nucleic acid through a chamber or channel with pits or spikes, or forcing the nucleic acid sample through a restricted size flow passage, e.g. an aperture having a cross sectional dimension in the micron or submicron scale. Particular care must be taken when fragmenting RNA as it is easily degraded. Those of skill in the art will be familiar with methods of fragmenting RNA. In a preferred embodiment, the RNA is fragmented by heat and ion-mediated hydrolysis.

Reactive groups and methods of modifying nucleic acid sequences to contain reactive groups will be well known to those of skill in the art. In a particularly preferred embodiment the nucleic acid fragments are enzymatically modified by T4 polynucleotide kinase and γ-S-ATP to add a thiol group suitable for biotinylation to the 5′ end of the nucleic acid fragments, thus producing thiolated nucleic acid fragments. See, for example, “Current Protocols in Molecular Biology,” Ausubel et al., editors, section 3.10.2-3.10.5 (1987) for a discussion of T4 Polynucleotide Kinases.

In one embodiment of the presently claimed invention, a detectable signal moiety is then reacted with the modified or unmodified 5′ end of the fragments to produce labeled fragments. In a preferred embodiment, a biotin group, such as PEO-Iodoacetyl Biotin, is conjugated to the 5′-ends of the fragments which have been modified by T4 polynucleotide kinase and γ-S-ATP. In a particularly preferred embodiment, the label is supplied to the nucleic acid by the addition of oxide biotinyl-iodacetamidyl-3,6-dioxaoctanediamine (Iodoacetyl Biotin) and more preferably by the addition of polyethylene oxide biotinyl-iodacetamidyl-3,6-dioxaoctanediamine (PEO-Iodoacetyl Biotin). PEO-Iodoacetyl Biotin (Pierce Chemical Co. Product # 21334ZZ) is a long-chain, water-soluble, sulfhydryl (-SH)-reactive biotinylation reagent. The PEO spacer arm imparts high water solubility. Iodoacetyl Biotin (Pierce Chemical Co. Product # 21333ZZ) is generally dissolved in DMSO or DMF before use. The iodoacetyl functional group reacts predominantly with free -SH groups. The reaction occurs by nucleophilic substitution of iodine with a thiol group, resulting in a stable thio-ether bond. The use of PEO-Iodoacetyl Biotin as a biotinylation reagent for proteins and antibodies has been described previously. See, for example, Instructions for EZ-Link™ PEO-Iodoacetyl Biotin, Pierce Chemical Co. We have found that PEO-Iodoacetyl Biotin is also a suitable label for nucleic acids. The use of Iodoacetyl Biotin as a biotinylation reagent for antibodies is described in, for example, U.S. Pat. No. 5,137,804. The use of Iodoacetyl Biotin as a label for the enzyme kinase is described in, for example, Jeong et al., “Kinase Assay Based on Thiophosphorylation and Biotinylation,” Biotechniques 27:1232-1238 (December 1999). We have also found that PEO-Iodoacetyl Biotin can be conjugated to a nucleic acid fragment without 5′ modification.

Other detectable signal moieties suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horse radish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label. Colloidal gold label can be detected by measuring scattered light.

After purification of the product, the efficiency of the labeling procedure can be assessed using, for example, a gel-shift assay. In this assay, the addition of biotin residues is monitored by comparing fragments which are pre-incubated with avidin prior to electrophoresis with fragments where no avidin has been added. Biotin-containing residues are retarded or shifted “upwards” on the gel during the electrophoresis due to avidin binding. The nucleic acids are then detected by staining. An absence of a shift pattern is an indication of no or poor biotin labeling.

The above disclosed labeling method may be employed for any nucleic acid molecule including both RNAs and DNAs. Furthermore, the labeling method may be performed without the enrichment protocol.

Methods of Use

Array-Based Assays

The nucleic acids isolated and/or labeled by the methods described in this disclosure may be analyzed by hybridization to nucleic acid arrays. Those of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. High density arrays may be used for a variety of applications, including, for example, gene expression analysis, genotyping and variant detection.

Various techniques for large scale polymer synthesis and probe array manufacturing are known. Some examples include U.S. Pat. Nos. 5,143,854, 5,242,979, 5,252,743, 5,324,663, 5,384,261, 5,405,783, 5,412,087, 5,424,186, 5,445,934, 5,451,683, 5,482,867, 5,489,678, 5,491,074, 5,510,270, 5,527,681, 5,550,215, 5,571,639, 5,593,839, 5,599,695, 5,624,711, 5,631,734, 5,677,195, 5,744,101, 5,744,305, 5,753,788, 5,770,456, 5,831,070, 6,040,193 and 5,856,011, all of which are incorporated by reference in their entirety for all purposes.

For gene expression analysis, the high density array will typically include a number of probes that specifically hybridize to the nucleic acid(s) whose expression is to be detected. Array based methods for monitoring gene expression are disclosed and discussed in detail in U.S. Pat. Nos. 5,800,992, 5,871,928, 5,925,525, 6,040,138 and PCT Application WO92/10588 (published on Jun. 25, 1992), all incorporated herein by reference for all purposes. Generally these methods of monitoring gene expression involve (1) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more target gene(s), or nucleic acids derived from the RNA transcript(s); (2) hybridizing the nucleic acid sample to a high density array of probes; and (3) detecting the hybridized nucleic acids and calculating a relative expression (transcription, RNA processing or degradation) level.

For genotyping and variant detection, the high density array will typically include a number of probes which are designed to interrogate a particular position which is believed or known to be associated with sequence variation. Array based methods for variant detection are disclosed and discussed in detail in U.S. Pat. Nos. 5,837,832, 5,856,104, 5,856,092, 5,858,659, 6,027,880 and 5,925,525, each of which is incorporated herein by reference for all purposes. Generally these methods of variant detection involve (1) providing a pool of target nucleic acids comprising DNA from the region(s) to be interrogated; (2) hybridizing the nucleic acid sample to a high density array of probes; and (3) detecting the hybridized nucleic acids and determining the presence or absence of a sequence variant.

Creation of an mRNA Library

The methods of the presently claimed invention can be used to create an mRNA library. The present techniques are particularly useful in creating an mRNA library from prokaryotic cells since prokaryotic mRNA lacks the polyA tail that is traditionally used to isolate mRNA populations from complex nucleic acid samples. Briefly, a sample is obtained from an individual. The sample is then enriched for mRNA using the techniques described by the presently claimed invention. Following standard protocols known in the art, the enriched mRNA can then be used as a template for cDNA synthesis. The cDNA second strand is then synthesized. Adaptors are ligated to the double stranded cDNA and the double stranded cDNA sequences are cloned into appropriate vectors.

Those of skill in the art will be familiar with methods for creating mRNA libraries. See, e.g., Maniatis et al., “Molecular Cloning: A Laboratory Manual,” 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989) (“Maniatis et al.”), especially Chapter 8, which is incorporated by reference in its entirety for all purposes.

cDNA synthesis typically involves the addition of short oligonucleotides which act as primers for reverse transcriptase. These short oligonucleotides may be of a specific known sequence, or may be of random sequence. The length and sequence of the short oligonucleotides will vary based upon the sequence to be reverse transcribed but preferably the short oligonucleotides are between 5 and 10 bases in length and most preferably are about 6 bases in length. Methods of cDNA synthesis are described, for example, in Maniatis et al., see especially sections 8.11-8.13.

For a description of second strand synthesis see, e.g., Maniatis et al., sections 8.13-8.17. Methods of ligating adaptors to the double stranded sequences and cloning those sequences into suitable vectors will be known to those of skill in the art and are well described in Maniatis et al., Chapter 8, sections 8.23-8.45. Analysis of cDNA libraries is described throughout Chapter 8 of Maniatis et al.

EXAMPLES

1. mRNA Enrichment by Removal of 16S and 23S rRNA Using In Vivo cDNA Synthesis

The following procedure was performed in PCR tubes in a thermocycler. An initial mixture was prepared by mixing 25 μg of total E. coli RNA with 13.75 μL of 5.0 μM rRNA Reverse Transcriptase (RT) Primer Mix, and adding deionized water (DI H₂O) to a final volume of 30 μL and a concentration of 0.83 μg/μL of RNA.

The following primers were used to target 16S and 23S RNA (each primer is 5 μM in the RT primer mix):

16S1514     5′-CCTACGGTTACCTTGTT-3′
16S889      5′-TTAACCTTGCGGCCGTACTC-3′
16S541      5′-TCGATTAACGCTTGCACCC-3′
23S2878     5′-CCTCACGGTTCATTAGT-3′
23SEco2064  5′-CTATAGTAAAGGTTCACGGG-3′
23SEco1519  5′-TCGTCATCACGCCTCAGCCT-3′
23S1012     5′-TCCCACATCGTTTCCCAC-3′
23S539      5′-CCATTATACAAAAGGTAC-3′
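As a minimal sketch, the primer list above can be captured in a small lookup table for sanity checks such as length, GC fraction, and the RNA site each DNA primer would anneal to. The dictionary name and helper functions below are illustrative assumptions, not part of the described procedure; the sequences are copied from the list above.

```python
# Primer names and sequences (5'->3') copied from the primer list above.
# RRNA_PRIMERS, target_rna_site and gc_fraction are hypothetical helper names.
RRNA_PRIMERS = {
    "16S1514": "CCTACGGTTACCTTGTT",
    "16S889": "TTAACCTTGCGGCCGTACTC",
    "16S541": "TCGATTAACGCTTGCACCC",
    "23S2878": "CCTCACGGTTCATTAGT",
    "23SEco2064": "CTATAGTAAAGGTTCACGGG",
    "23SEco1519": "TCGTCATCACGCCTCAGCCT",
    "23S1012": "TCCCACATCGTTTCCCAC",
    "23S539": "CCATTATACAAAAGGTAC",
}

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def target_rna_site(primer: str) -> str:
    """Return the RNA sequence (5'->3') that a DNA primer is complementary to."""
    reverse_complement = primer.translate(_DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in RRNA_PRIMERS.items():
        print(f"{name:<11} len={len(seq):2d} "
              f"GC={gc_fraction(seq):.2f} site={target_rna_site(seq)}")
```

The reverse complement is written as RNA because the primers are intended to anneal to rRNA rather than to DNA.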

The RNA/RT primer mix/DI H₂O mixture was heated to 70° C. for 5 minutes and then transferred to 4° C.

To the above mixture, a reverse transcription mixture including 10 μL of 10× MMLV RT Buffer, 5 μL of 100 mM DTT, 2 μL of 25 mM dNTP Mix, 3 μL of 24.5 U/μL RNAse Inhibitor (RNAguard Ribonuclease Inhibitor (Porcine), Amersham Pharmacia Biotech, P/N 27-0816-01), 6 μL of 50 U/μg MMLV Reverse Transcriptase (Epicentre Technologies, P/N MCR85101) and 44 μL of DI H₂O was added and the reaction was carried out at 42° C. for 25 minutes and transferred to 45° C. for an additional 20 minutes. The mixture was then transferred to 4° C.
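The volumes given above add up to a 100 μL reaction, which makes the final concentrations easy to check with the usual dilution relation (stock concentration × volume added ÷ total volume). The following is a rough sketch under that assumption; the function and variable names are hypothetical, and only the volumes and stock concentrations come from the text.

```python
def final_concentration(stock_conc: float, volume_added_ul: float,
                        total_volume_ul: float) -> float:
    """Dilution relation C_final = C_stock * V_added / V_total (same units as stock)."""
    return stock_conc * volume_added_ul / total_volume_ul


# Volumes in microliters, taken from the procedure above.
INITIAL_MIX_UL = 30.0  # RNA + RT primer mix + DI H2O
RT_MIX_UL = {
    "10x MMLV RT buffer": 10.0,
    "100 mM DTT": 5.0,
    "25 mM dNTP mix": 2.0,
    "RNAse inhibitor": 3.0,
    "MMLV reverse transcriptase": 6.0,
    "DI H2O": 44.0,
}

total_ul = INITIAL_MIX_UL + sum(RT_MIX_UL.values())

buffer_x = final_concentration(10.0, RT_MIX_UL["10x MMLV RT buffer"], total_ul)
dtt_mm = final_concentration(100.0, RT_MIX_UL["100 mM DTT"], total_ul)
dntp_mm = final_concentration(25.0, RT_MIX_UL["25 mM dNTP mix"], total_ul)
primer_um = final_concentration(5.0, 13.75, total_ul)  # each primer, from the 5 uM mix
rna_ug_per_ul = 25.0 / total_ul

if __name__ == "__main__":
    print(f"total volume: {total_ul} uL")
    print(f"buffer: {buffer_x}x, DTT: {dtt_mm} mM, dNTP mix: {dntp_mm} mM")
    print(f"each primer: {primer_um} uM, RNA: {rna_ug_per_ul} ug/uL")
```

Under these assumptions the buffer ends up at 1×, which is consistent with adding 10 μL of a 10× stock to a 100 μL reaction.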

The rRNA in the DNA:RNA hybrids was then digested by adding 5 μL of 10 U/μL RNAse H (Epicentre Technologies, P/N R0601K) and incubating at 37° C. for 45 minutes. The enzyme was heat deactivated at 65° C. for 5 minutes and the mixture was then transferred to 4° C.

The DNA was then removed by adding 2.5 μL of 5 U/μL DNAse I (Amersham-Pharmacia Biotech P/N 27-0514-01) and 1 μL of 24.5 U/μL RNAse inhibitor. Digestion was carried out at 37° C. for 20 minutes and the enzyme was deactivated by adding EDTA to a final concentration of 10 mM.

After the reaction was completed, the product was purified (RNeasy Total RNA Isolation Kit, QIAGEN P/N 74104). The sample and another sample of unmodified E. coli total RNA were then labeled using the methods described below in Example 4 and separately hybridized to E. coli Genome Array (Affymetrix, Inc., Santa Clara, Calif. P/N 510051). The hybridized arrays were then washed, stained and scanned using standard methods as described in the E. coli Genome Array User's Manual (Affymetrix, Inc., Santa Clara, Calif.).

The removal efficiency for 16S and 23S rRNA is typically between 80% and 90%. FIGS. 6 and 7 show the results of hybridization of enriched and non-enriched RNA to microarrays. FIG. 6 shows hybridization of labeled unenriched RNA to a microarray. FIG. 7 shows hybridization of labeled enriched RNA to an identical microarray. As can be seen by comparing FIGS. 6 and 7, FIG. 7 shows a much cleaner hybridization with less signal produced by cross hybridization.

2. mRNA Enrichment by Removal of 16S and 23S rRNA Using Exogenous DNA

Cloned DNAs encoding the E. coli 16S and 23S rRNA genes were amplified separately by PCR and purified with the QIAquick PCR purification kit (QIAGEN P/N 28104). One μg of 16S and 1 μg of 23S rDNA were combined in a PCR tube and diluted to 25 μL with DI H₂O. The DNA was denatured by heating at 99° C. for 5 minutes in a thermocycler. The tube was transferred to 70° C. followed by the addition of 25 μL of a prewarmed (at 70° C.) solution containing 1 μg E. coli total RNA, 200 mM NaCl, 100 mM Tris (pH 7.5). The tube was incubated at 70° C. for 30 minutes to permit annealing of the rRNAs to the corresponding complementary strand of rDNA (approximately 1:1 molar ratio). The tube was then transferred to 37° C. followed by the addition of 50 μL of a prewarmed (at 37° C.) solution containing 2 units of E. coli RNAse H (Epicentre Technologies P/N R0601K), 50 mM Tris (pH 7.5), 100 mM NaCl, 20 mM MgCl₂, and the reaction was incubated at 37° C. for 20 minutes to digest RNA from DNA:RNA hybrids. DNA was then digested by the addition of 2 units of DNAse I (Epicentre Technologies, P/N D9902K) and incubation at 37° C. for 15 minutes. EDTA was then added to a final concentration of 20 mM to inhibit further nuclease activity. RNA was purified with an RNeasy column (QIAGEN P/N 74104) and then analyzed in a denaturing agarose gel stained with ethidium bromide.

FIG. 8 is a gel image of three samples. Lane 1 is an untreated sample. Lane 2 is an enriched sample where the RNAse H step was not performed. Lane 3 is an enriched sample. Comparison of lanes 1, 2, and 3 indicates that the loss of the 16S and 23S rRNA bands in the enrichment procedure resulted from the specificity of RNAse H for DNA:RNA hybrids.

3. mRNA Enrichment by Removal of 16S and 23S rRNA Using DNA Bait Recycling

Cloned DNAs encoding the E. coli 16S and 23S rRNA genes were amplified separately by PCR and purified with the QIAquick PCR purification kit (QIAGEN P/N 28104). 0.6 μg of 16S and 0.6 μg of 23S rDNA were combined in a PCR tube and diluted to 48 μL with DI H₂O. The DNA was denatured by heating at 99° C. for 5 minutes in a thermocycler. The temperature was lowered to 70° C. followed by the addition of 48 μL of a prewarmed (at 70° C.) solution containing 6 μg E. coli total RNA, 200 mM NaCl, 100 mM Tris (pH 7.5), and 12 units of thermostable RNAse H (Epicentre Technologies, P/N H39100). The tube was incubated at 70° C. for 1 minute to permit annealing of the rRNAs to the corresponding complementary strand of rDNA (approximately 1 mole DNA per 10 moles RNA). The temperature was reduced to 50° C. for 5 minutes to complete one cycle of enrichment. The temperature was then increased to 70° C. for 1 minute then again reduced to 50° C. for 5 minutes to complete the second cycle. This temperature cycling was repeated a total of 30 times. After 1, 5, 10, 20, and 30 cycles, 16 μL (corresponding to 1 μg RNA from the starting mixture) was removed from the tube, mixed with 1 unit DNAse I (Epicentre Technologies, P/N D9902K) and incubated at 37° C. for 15 minutes. EDTA was then added to a final concentration of 20 mM to inhibit further nuclease activity. RNA was purified from each sample with an RNeasy column (QIAGEN P/N 74104) and then analyzed in a denaturing agarose gel, along with 1 μg of untreated E. coli total RNA (FIG. 9). The diminishing amounts of 23S and 16S RNA as cycles are repeated can be seen by comparing the lanes from left to right. The first lane (labeled U) is untreated. The next lanes show the amount of 23S and 16S RNA after 1, 5, 10, 20 and 30 cycles, respectively.
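The sampling arithmetic and the temperature program described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the total volume is assumed to be the simple sum of the two 48 μL portions.

```python
def aliquot_volume_ul(total_volume_ul: float, total_rna_ug: float,
                      rna_wanted_ug: float) -> float:
    """Volume that contains the wanted amount of starting RNA."""
    return total_volume_ul * rna_wanted_ug / total_rna_ug


def cycling_program(n_cycles: int, hot_step=(70, 1), cool_step=(50, 5)):
    """Return the program as a list of (temperature in deg C, minutes) steps."""
    program = []
    for _ in range(n_cycles):
        program.append(hot_step)   # 70 deg C for 1 minute
        program.append(cool_step)  # 50 deg C for 5 minutes
    return program


TOTAL_VOLUME_UL = 48.0 + 48.0      # rDNA dilution + RNA/RNAse H solution
TOTAL_RNA_UG = 6.0
SAMPLING_CYCLES = [1, 5, 10, 20, 30]

sample_ul = aliquot_volume_ul(TOTAL_VOLUME_UL, TOTAL_RNA_UG, 1.0)
program = cycling_program(30)
total_minutes = sum(minutes for _, minutes in program)

if __name__ == "__main__":
    print(f"aliquot per time point: {sample_ul} uL")
    print(f"volume removed in total: {sample_ul * len(SAMPLING_CYCLES)} uL "
          f"of {TOTAL_VOLUME_UL} uL")
    print(f"cycling time for 30 cycles: {total_minutes} minutes")
```

The five aliquots together use 80 μL of the 96 μL reaction, so all time points can be drawn from a single tube.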

The gel was transferred to a nylon membrane (Northern transfer) and the quantity of a particular mRNA transcript, from the E. coli lpp gene, was deduced by hybridization to a digoxigenin-labeled lpp probe (Roche P/N 1636090), followed by detection with anti-DIG-alkaline phosphatase and NBT/BCIP (Roche P/N 1175041) (FIG. 10). It is apparent that the bands corresponding to the 23S and 16S rRNAs are reduced much more with successive cycles than the band corresponding to the lpp transcript, an indication of specific reduction of rRNA and relative enrichment of mRNA. The enrichment demonstrates that the input exogenous DNA bait is “recycled,” that is, each complementary rDNA molecule can direct the destruction of multiple rRNA molecules.

4. mRNA Labeling (Thiol Kinase—Dependent Method)

Fragmentation and labeling reactions were done in PCR tubes in a thermocycler. A maximum of 20 μg of RNA was used for the fragmentation step. To avoid incomplete fragmentation, multiple tubes were used if the yield of RNA from the enrichment step was greater than 20 μg. The fragmentation reaction mixture comprised 10 μl of 10× NEBuffer for T4 Polynucleotide Kinase (New England Biolabs, P/N 201L), up to 20 μg of RNA and deionized water (DI H₂O) up to 88 μl total volume. The reaction was incubated at 95° C. for 30 minutes and then cooled to 4° C.
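The 20 μg per tube limit above implies a simple rule for splitting a larger RNA yield across tubes. The sketch below assumes the yield is divided evenly among the minimum number of tubes; the even split and the function names are illustrative assumptions, since the text only states the per-tube maximum.

```python
import math

MAX_RNA_PER_TUBE_UG = 20.0  # maximum stated in the text


def tubes_needed(rna_yield_ug: float,
                 max_per_tube_ug: float = MAX_RNA_PER_TUBE_UG) -> int:
    """Minimum number of fragmentation tubes for a given RNA yield."""
    if rna_yield_ug <= 0:
        raise ValueError("RNA yield must be positive")
    return math.ceil(rna_yield_ug / max_per_tube_ug)


def rna_per_tube_ug(rna_yield_ug: float,
                    max_per_tube_ug: float = MAX_RNA_PER_TUBE_UG) -> float:
    """RNA per tube if the yield is split evenly (an assumption)."""
    return rna_yield_ug / tubes_needed(rna_yield_ug, max_per_tube_ug)


if __name__ == "__main__":
    for yield_ug in (12.0, 20.0, 45.0):
        print(f"{yield_ug} ug -> {tubes_needed(yield_ug)} tube(s), "
              f"{rna_per_tube_ug(yield_ug)} ug each")
```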

The 5′-thiolation reaction mixture comprised 88 μl fragmented RNA, 2.0 μl of 5 mM γ-S-ATP (Roche P/N 1162306) and 10 μl of 10 U/μl T4 Polynucleotide Kinase (New England Biolabs, P/N 201L). The reaction was incubated at 37° C. for 50 minutes and then inactivated at 65° C. for 10 minutes and finally cooled to 4° C.

Excess γ-S-ATP was removed by ethanol precipitation: the samples were removed from the PCR tube(s) and combined in a sterile microcentrifuge tube. 1/10 volume of 3 M sodium acetate, pH 5.2 (Sigma Chemical, P/N S 7899) and 2.5 volumes of ethanol were added and the mixture was left on ice for 15 minutes. The tubes were then spun at 14,000 rpm at 4° C. for 30 minutes to pellet the RNA. The pellet was then resuspended in 90 μl of DI H₂O.

The RNA was then labeled with biotin. 6.0 μl of 500 mM MOPS, pH 7.5 (Sigma Chemical P/N M3183) was added to 90 μl of fragmented thiolated RNA with 4.0 μl of 50 mM Polyethylene Oxide (PEO)-Iodoacetyl-Biotin (Pierce Chemical, P/N 21334ZZ). The reaction was incubated at 37° C. for one hour and then cooled to 4° C. Unincorporated label was removed using the QIAGEN RNA/DNA Mini Column Kit (QIAGEN P/N 14123). Optionally, for increased RNA recovery, one RNA/DNA column and 5.4 mL Buffer QRV2 per 10.0 μg RNA was used. Additionally, 50 μg of glycogen (Boehringer Mannheim, P/N 901393) per tube was optionally used to act as a carrier and aid in the visualization of the pellet.

The pellet was then dissolved in 20 to 30 μL of Molecular Biology Grade water.

The enriched mRNA preparation was quantified by 260 nm absorbance. Typical yields for the procedure were 2 to 4 μg of RNA. The labeled RNA was stored at −20° C. until ready for use.

The efficiency of the labeling was assessed using a gel shift assay. In this assay, the addition of biotin residues is monitored by comparing fragments which are pre-incubated with avidin prior to electrophoresis with fragments where no avidin has been added. Biotin-containing residues are retarded or shifted “upwards” on the gel during the electrophoresis due to avidin binding. The nucleic acids are then detected by staining. An absence of a shift pattern is an indication of no or poor biotin labeling.

A NeutrAvidin solution of 2 mg/mL or higher was prepared (Pierce Chemical, P/N 31000ZZ). 50 mM Tris, pH 7.0 (Ambion, P/N 9850G) was used to dilute the NeutrAvidin solution. A TBE gel (4%-20%) (Invitrogen, P/N EC62252) was placed into a gel holder and the system was loaded with 1× TBE Buffer. For each sample tested, two 150 to 200 ng aliquots of fragmented and biotinylated sample were removed. 5 μl of 2 mg/mL NeutrAvidin were added to each tube tested. The mixture was allowed to sit at room temperature for 5 minutes. Loading dye (Amresco, P/N E-274) was added to a 1× dye concentration. 10 bp and 100 bp DNA ladders (Gibco BRL P/N 10821-015 and 15628-019) were prepared and both samples and ladders were loaded on the gel. The gel was run at 150 volts for approximately 1 hour. While the gel was running, SYBR Green I or Gold (Molecular Probes P/N S-7563 or S-11494) was prepared for staining. After completion of the gel run, the gel was stained for 10 minutes.

After staining, the gel was placed in a UV light box to produce an image. FIG. 11 is a gel image of the labeled E. coli fragments. Lane 1 is the 10 bp DNA ladder, lane 2 is fragmented and labeled total E. coli RNA, lane 3 is fragmented and labeled total E. coli RNA with avidin, lane 4 is fragmented and labeled enriched E. coli mRNA, lane 5 is fragmented and labeled enriched E. coli mRNA with avidin and lane 6 is the 100 bp DNA ladder. Lanes 3 and 5 show a clear upward shift as compared to lanes 2 and 4 respectively, thus indicating successful biotin labeling of the RNA fragments.

5. mRNA Labeling (Thiol Kinase—Independent Method)

mRNA enrichment was performed as described in Example 1 above. To label the enriched RNA directly with biotin with the thiol kinase (tk)-independent method, the following were combined in a final volume of 100 μL: 10 μg of RNA, 30 mM MOPS, pH 7.5, 20 mM iodoacetyl-PEO-biotin (Pierce Chemicals), 10 mM magnesium chloride. The components were placed in a PCR tube, heated to 95° C. for 30 min, then 25° C. for 30 min and cooled to 4° C. in a PCR instrument as above. Unreactive label was removed from the labeled RNA fragments on RNA/DNA mini-columns (Qiagen). The labeled RNA solution was mixed with 5.4 mL of QRV2 buffer (Qiagen) before loading on a single column. Labeled RNA fragments were precipitated after the addition of 25 μg of carrier glycogen.
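Because this reaction is specified by final concentrations in a 100 μL volume, the volumes to pipette depend on the stock solutions on hand. The sketch below works backwards from the final concentrations; all stock concentrations are assumptions made for illustration (the MOPS and biotin reagent stocks are borrowed from Example 4, while the 1 M magnesium chloride stock and the 1 μg/μL RNA stock are hypothetical).

```python
def volume_to_add_ul(stock_conc: float, final_conc: float,
                     total_volume_ul: float) -> float:
    """Volume of stock needed: V = C_final * V_total / C_stock."""
    return final_conc * total_volume_ul / stock_conc


TOTAL_VOLUME_UL = 100.0

# (assumed stock concentration in mM, final concentration in mM from the text)
COMPONENTS_MM = {
    "MOPS, pH 7.5": (500.0, 30.0),          # stock as used in Example 4
    "iodoacetyl-PEO-biotin": (50.0, 20.0),  # stock as used in Example 4
    "magnesium chloride": (1000.0, 10.0),   # hypothetical 1 M stock
}
RNA_STOCK_UG_PER_UL = 1.0  # hypothetical RNA concentration
RNA_AMOUNT_UG = 10.0       # from the text

volumes_ul = {
    name: volume_to_add_ul(stock, final, TOTAL_VOLUME_UL)
    for name, (stock, final) in COMPONENTS_MM.items()
}
rna_ul = RNA_AMOUNT_UG / RNA_STOCK_UG_PER_UL
water_ul = TOTAL_VOLUME_UL - rna_ul - sum(volumes_ul.values())
if water_ul < 0:
    raise ValueError("stocks are too dilute to fit in the final volume")

if __name__ == "__main__":
    for name, vol in volumes_ul.items():
        print(f"{name}: {vol} uL")
    print(f"RNA: {rna_ul} uL, water: {water_ul} uL")
```

With different stocks the volumes change, but the check that the water volume stays non-negative still applies.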

To compare the efficiency of labeling, gel shift assays were performed as described in Example 4 above. FIG. 12 is the gel image. Lane 1 contains a 10 bp DNA ladder, lane 2 contains RNA labeled by the tk-independent method without avidin, lane 3 contains RNA labeled by the tk-independent method with avidin, lane 4 contains RNA labeled by the tk-independent method without avidin, lane 5 contains RNA labeled by the tk-independent method with avidin, lane 6 contains avidin alone as a control, lane 7 contains RNA labeled by the tk-dependent method without avidin, and lanes 8-13 contain RNA labeled with the tk-dependent method with avidin. Lanes 3, 5 and 8-13 all show a clear shift as compared to their respective controls, clearly indicating that the RNA fragments have been labeled. Comparison by eye demonstrates that the tk-independent method labels with less intensity than the tk-dependent method. A lower labeling efficiency may be advantageous in samples for which the signal is very strong and data accuracy is inhibited by saturation of the signal.

6. Comparison of E. coli Expression Using Both the TK-Dependent and TK-Independent Labeling Methods.

To further compare the two labeling methods, the expression patterns of RNA from E. coli strains grown in minimal media and enriched media were analyzed. Cells were grown in either minimal media or enriched media conditions, RNA was isolated from each population, and the RNA was then labeled using either the tk-dependent or tk-independent method. Expression data was analyzed by hybridizing the labeled RNA to microarrays designed to interrogate E. coli. The microarray data was then compared to traditional Northern blot and Slot blot data from similarly treated populations of cells.

E. coli strain MG1655 was obtained from the E. coli Genetic Stock Center located at Yale University. Luria Broth (Teknova) was used for the enriched medium. Cells were grown at 37° C. on a gyrotory shaker set at 270-280 rpm. Cells were harvested at mid-log phase (OD 0.8-0.9 at 420 nm). Total RNA was isolated using the MasterPure™ RNA Purification Kit (Epicentre).

RNA spike controls were prepared by in vitro transcription of linearized plasmid templates. After purification, the RNA was quantified by its absorbance at 260 nm. Control RNA spikes (2 femtomoles each) were added to the E. coli RNA prior to labeling.

The RNA was labeled using the tk-dependent and tk-independent methods described in Examples 4 and 5, respectively. In both cases unreactive label was removed from the labeled RNA fragments on RNA/DNA mini-columns (Qiagen). The labeled RNA solution was mixed with 5.4 mL of QRV2 buffer (Qiagen) before loading on a single column. Labeled RNA fragments were precipitated after the addition of 25 μg of carrier glycogen.

Both samples were then hybridized to E. coli Genome Array (Affymetrix, Inc., Santa Clara, Calif. P/N 510051). The hybridized arrays were then washed, stained and scanned using standard methods as described in the E. coli Genome Array User's Manual (Affymetrix, Inc., Santa Clara, Calif.).

Duplicate assays were run for each method. FIG. 13 is an array image from the experiment. Panel A is the array image of the hybridized E. coli RNA labeled with the tk-dependent method. Panel B is an array image of the hybridized E. coli RNA labeled with the tk-independent method. Signal shows up as a bright spot against a dark background. A comparison of the two images by eye shows that the tk-independent method showed a lower level of signal intensity.

Data was analyzed using the GeneChip® Software from Affymetrix, Inc. Calls, Average Difference values and Fold Changes were calculated with GeneChip® Software through the Expression Analysis Window. Default settings were used for the analysis. The number of sequences called present and the median average difference were calculated for each of the labeling techniques and the results are shown in Table 1, below.

TABLE 1
Calls in the RNA coding region

                 thiol kinase method     non thiol kinase method
                 Exp. A      Exp. B      Exp. 1      Exp. 2
Total            4216        4216        4216        4216
#'s Present      1938        2011        1928        1777
#'s Absent       2188        2130        2242        2378
% Absent         51.9        50.5        53.2        56.4
Avg Med Int      2111        1806        926         815
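As a quick consistency check on Table 1, the “% Absent” row can be recomputed from the “Total” and “#'s Absent” rows. The sketch below uses hypothetical variable names; the counts are copied from the table. It also reports the probe sets that are neither present nor absent, which the text does not characterize.

```python
# Counts copied from Table 1; the dictionary layout is an illustrative choice.
TABLE_1 = {
    "tk-dependent Exp. A":   {"total": 4216, "present": 1938, "absent": 2188},
    "tk-dependent Exp. B":   {"total": 4216, "present": 2011, "absent": 2130},
    "tk-independent Exp. 1": {"total": 4216, "present": 1928, "absent": 2242},
    "tk-independent Exp. 2": {"total": 4216, "present": 1777, "absent": 2378},
}


def percent_absent(row: dict) -> float:
    """Absent calls as a percentage of all probe sets, to one decimal place."""
    return round(100.0 * row["absent"] / row["total"], 1)


def uncalled(row: dict) -> int:
    """Probe sets counted as neither present nor absent in the table."""
    return row["total"] - row["present"] - row["absent"]


if __name__ == "__main__":
    for name, row in TABLE_1.items():
        print(f"{name}: {percent_absent(row)}% absent, "
              f"{uncalled(row)} neither present nor absent")
```

The recomputed percentages agree with the table (51.9, 50.5, 53.2 and 56.4).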

As seen in Table 1, row 1 (labeled “Total”), a total of 4,216 probe sets representing open reading frames were analyzed. In simplified terms, if a hybridization signal above a certain threshold is detected, the probe set is called present. Row 2 (labeled “#'s Present”) shows the number of probe sets representing open reading frames on the array that were called present. If the hybridization signal is below the threshold, the gene is called absent. Row 3 (labeled “#'s Absent”) shows the number of genes called absent. For the purposes of this application, “Average Median Intensity” (the row labeled “Avg Med Int”) is used to quantitate signal intensity readings across the entire array.

Higher signal intensity is observed for the tk-dependent method (“Avg Med Int” row, experiments A and B) than with the tk-independent method (“Avg Med Int” row, experiments 1 and 2). Comparison of the results in that row shows that the tk-independent method exhibits about half the intensity of the tk-dependent method. Importantly, the decreased signal intensity does not translate into a significant loss in the number of genes called present in the two methods (compare row 2, experiments A and B with row 2, experiments 1 and 2). This result indicates that the tk-independent method labels at about half the intensity of the tk-dependent method. Under some conditions, lower signal intensity may be desirable to prevent loss of accuracy due to signal saturation.
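The “about half” comparison can be reproduced from the Table 1 values by averaging the duplicate experiments for each method. This is only a rough check with hypothetical variable names; averaging the duplicates is an assumption about how the comparison is made.

```python
from statistics import mean

# Values copied from Table 1 (duplicate experiments per method).
AVG_MED_INT = {"tk-dependent": [2111, 1806], "tk-independent": [926, 815]}
PRESENT_CALLS = {"tk-dependent": [1938, 2011], "tk-independent": [1928, 1777]}

intensity_ratio = (mean(AVG_MED_INT["tk-independent"])
                   / mean(AVG_MED_INT["tk-dependent"]))
present_ratio = (mean(PRESENT_CALLS["tk-independent"])
                 / mean(PRESENT_CALLS["tk-dependent"]))

if __name__ == "__main__":
    print(f"intensity ratio (independent/dependent): {intensity_ratio:.3f}")
    print(f"present-call ratio (independent/dependent): {present_ratio:.3f}")
```

On these numbers the intensity ratio is a little under one half, while the number of present calls drops by only a few percent.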

Correlation graphs were prepared using average difference values for all 4,216 probe sets representing open reading frames. For the purposes of this application, average difference is used to demonstrate the signal intensity between probe pairs on the same array. Both techniques create reproducible results as seen in the intra-assay correlation graphs (FIGS. 14 and 15).

FIG. 14 shows the average difference correlation comparing the results of two different tk-dependent experiments to each other. The X axis indicates the average difference results from experiment A and the Y axis indicates the average difference results from experiment B. A perfect correlation, i.e. perfect reproducibility between different experiments, would be indicated by an r² value of 1. The r² value in this case is 0.991, indicating a good correlation, or in other words, a high degree of reproducibility in signal intensity for the tk-dependent method.

FIG. 15 shows the average difference correlation comparing the results of two different tk-independent experiments to each other. The X axis indicates the average difference results from experiment 1 and the Y axis indicates the average difference results from experiment 2. Again, a perfect correlation would be indicated by an r² value of 1. The r² value in this case is 0.9898, indicating a good correlation, or in other words, a high degree of reproducibility in signal intensity for the tk-independent method.

The two different methods are correlated as seen in FIG. 16. In FIG. 16, the X axis represents the tk-dependent experiments (average of exp. A+exp. B) and the Y axis represents the tk-independent experiments (average of exp. 1+exp. 2). The slope is 0.5075, again indicating that the label in the tk-independent method is about half as intense as the tk-dependent method. Note that the correlation coefficient is 0.951, indicating a high degree of correlation between the two techniques. The major discrepancies are seen at the high intensity levels where the tk-dependent method may have reached saturation.
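For readers who want to reproduce this kind of comparison on their own data, a slope and r² can be obtained from an ordinary least-squares fit of one set of average difference values against the other. The sketch below is a generic implementation, not the GeneChip® Software calculation; the example values are invented for illustration, and whether the reported fit included an intercept is not stated in the text.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus r squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical average difference values, NOT data from the experiments.
    tk_dependent = [100.0, 400.0, 900.0, 2500.0, 6000.0]
    tk_independent = [55.0, 190.0, 470.0, 1300.0, 2900.0]
    slope, intercept, r2 = linear_fit(tk_dependent, tk_independent)
    print(f"slope={slope:.3f} intercept={intercept:.1f} r^2={r2:.3f}")
```

A slope near 0.5 with a high r² would correspond to the situation described above, where one method gives roughly half the signal of the other across probe sets.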

CONCLUSION

The presently claimed invention provides greatly improved methods for enriching and labeling nucleic acids. It is to be understood that the above description is intended to be illustrative and not restrictive. Many variations of the invention will be apparent to those of skill in the art upon reviewing the above description. By way of example, the invention has been described primarily with reference to the enrichment and labeling of mRNA, but it will be readily recognized by those of skill in the art that the invention may be employed to enrich and label all types of nucleic acids including other forms of naturally and non-naturally occurring polynucleotides such as RNAs and DNAs. Furthermore, it will be understood by those of skill in the art that the enriched and/or labeled nucleotides of the presently claimed invention may be utilized in a wide variety of biological analyses in no way limited to those methods disclosed in the present invention. Therefore, it is to be understood that the scope of the invention is not to be limited except as otherwise set forth in the claims.

1. A method of preparing a nucleic acid comprising: increasing therelative percentage of a population of nucleic acids of interest withina mixed population of nucleic acids, wherein said population of interestcomprises a plurality of nucleic acid sequences, comprising: (a)contacting a nucleic acid sample with a bait molecule, wherein said baitmolecule is capable of complexing specifically to a target sequence, butnot to said sequences in said population of interest, under suchconditions as to allow for the formation of a bait:target complex; (b)removing said bait:target complex from said mixed population therebyresulting in an increase in the relative percentage of said populationof interest; fragmenting the sequences from said population of interestto produce fragments; and adding a signal moiety to the fragments. 2.The method of claim 1 wherein the nucleic acid sample is an RNA sample.3. The method of claim 1 wherein the nucleic acid sample is derived froma prokaryotic organism.
 4. The method of claim 1 wherein the nucleicacid sample is derived from a gram negative prokaryotic organism.
 5. Themethod of claim 1 wherein the nucleic acid sample is derived from E.coli.
 6. The method of claim 1 wherein said population of interest ismessenger RNA (mRNA.)
 7. The method of claim 1 wherein said targetsequence is stable RNA.
 8. The method of claim 1 wherein said targetsequence is ribosomal RNA (rRNA).
 9. The method of claim 1 wherein saidtarget sequence is 23S RNA.
 10. The method of claim 1 wherein saidtarget sequence is 16S RNA.
 11. The method of claim 1 wherein said baitmolecule is generated exogenously.
 12. The method of claim 1 whereinsaid bait molecule is chemically synthesized.
 13. The method of claim 1wherein said bait molecule is cloned from single stranded phage DNA. 14.The method of claim 1 wherein said bait molecule is synthesized byreverse transcriptase using said target sequence as a template.
 15. Themethod of claim 1 wherein the nucleic acid sample is an RNA sample, thebait molecule is DNA, and the bait:target complex is a DNA:RNA hybrid.16. The method of claim 14 wherein said bait molecules are synthesizedby reverse transcriptase after the addition of primers comprising atleast one of the following sequences: 5′-CCTACGGTTACCTTGTT-3′5′-TTAACCTTGCGGCCGTACTC-3′ 5′-TCGATTAACGCTTGCACCC-3′5′-CCTCACGGTTCATTAGT-3′ 5′-CCATTATACAAAAGGTAC-3′5′-CTATAGTAAAGGTTCACGGG-3′ 5′-TCGTCATCACGCCTCAGCCT-3′5′-TCCCACATCGTTTCCCAC-3′.


17. The method of claim 1 wherein said bait is attached to a solid substrate.
18. The method of claim 17 wherein said solid substrate is a bead.
19. The method of claim 17 wherein said step of removing said target sequence is accomplished by separating said solid substrate from said mixed population.
20. The method of claim 1 wherein said bait is modified to comprise a selectable element.
21. The method of claim 20 wherein said selectable element is selected from the group consisting of: a nucleic acid sequence, a ligand, a receptor, an antibody, a haptenic group, an antigen, an enzyme or an enzyme inhibitor.
22. The method of claim 20 further comprising the step of exposing said bait:target complex to a reagent capable of binding said selectable element to form a reagent:bait:target complex.
23. The method of claim 22 wherein the reagent capable of binding said selectable element is selected from the group consisting of: a nucleic acid sequence, a ligand, a receptor, an antibody, a haptenic group, an antigen, an enzyme or an enzyme inhibitor.
24. The method of claim 20 wherein said selectable element is a biotin.
25. The method of claim 22 wherein said reagent capable of binding said selectable element is streptavidin.
26. The method of claim 22 wherein said step of removing said RNA sequence is accomplished by separating said reagent:bait:target complex from said mixed population.
27. The method of claim 26 wherein the reagent:bait:target complex is attached to a solid support.
28. The method of claim 15 wherein said step of removing said RNA:DNA hybrid comprises exposing said RNA:DNA hybrid to a reagent which specifically recognizes RNA:DNA hybrids.
29. The method of claim 28 wherein said reagent is RNAse H.
30. The method of claim 28 wherein said reagent is an antibody.
31. The method of claim 1 wherein the step of removing said bait:target complex is a two step process in which the target is removed first and the bait molecule is removed thereafter.
32. The method of claim 29 further comprising the step of removing any remaining DNA bait molecules after said target RNA sequence is removed.
33. The method of claim 32 wherein said step of removing said DNA bait molecule is accomplished by digestion with DNAse I.
34. The method of claim 31 wherein steps (a) and (b) are repeated.
35. The method of claim 34 wherein the same bait molecule is used to remove multiple target sequences.
36. The method of claim 35 wherein a thermostable RNAse H is used to remove said target sequences from said bait:target complex.
37. The method of claim 34 wherein step (a) is performed at a first temperature and step (b) is performed at a second temperature.
38. The method of claim 1 wherein said signal moiety is a biotin.
39. The method of claim 1 wherein said signal moiety is a PEO-Iodoacetyl Biotin.
40. The method of claim 1 wherein the signal moiety is attached to the 5′ ends of said fragments.
41. The method of claim 40 wherein after said step of fragmenting, said 5′ ends of said fragments are chemically modified.
42. The method of claim 41 wherein the 5′ ends of said fragments are chemically modified by γ-S-ATP and T4 kinase.
43. The method of claim 40 wherein said chemical modification results in the addition of a thiol group to the 5′ end of said fragments.
44. The method of claim 43 wherein said detectable signal moiety is PEO-Iodoacetyl Biotin.
45. A method of increasing the relative percentage of a nucleic acid population of interest within a mixed population of nucleic acids, wherein said population of interest comprises a plurality of nucleic acid sequences, comprising: (a) contacting a nucleic acid sample with a bait molecule, wherein said bait molecule is capable of hybridizing specifically to a target sequence but not to said sequences in said population of interest, under such conditions as to allow for the formation of a bait:target complex; and (b) removing said bait:target complex from said mixed population thereby resulting in an increase in the relative percentage of said nucleic acid population of interest.
46. The method of claim 45 wherein the nucleic acid sample is an RNA sample.
47. The method of claim 45 wherein the nucleic acid sample is derived from a prokaryotic organism.
48. The method of claim 45 wherein the nucleic acid sample is derived from a gram negative prokaryotic organism.
49. The method of claim 45 wherein the nucleic acid sample is derived from E. coli.
50. A compound having the formula: n-S-acetyl-PEO-sig wherein n is a polynucleotide, S is thiol, acetyl is an acetyl functional group, PEO is polyethylene oxide, and sig is a signal moiety.
51. The compound of claim 50 wherein said signal moiety is a biotin.
52. The compound of claim 50 wherein said polynucleotide is a DNA.
53. The compound of claim 50 wherein said polynucleotide is an RNA.
54. The compound of claim 50 wherein said polynucleotide is an mRNA.
55. The compound of claim 50 wherein said thiol group is at the 5′ of said polynucleotide.
56. A method for labeling a polynucleotide comprising: contacting said polynucleotide with PEO-iodoacetyl conjugated to a signal moiety under conditions such that the PEO-iodoacetyl will attach to said polynucleotide.
57. The method of claim 56 wherein said polynucleotide comprises a thiol group.
58. The method of claim 57 wherein said thiol group is at the 5′ of said polynucleotide.
59. The method of claim 58 wherein said signal moiety is a biotin.
60. The method of claim 56 wherein said polynucleotide is a DNA.
61. The method of claim 56 wherein said polynucleotide is an RNA.
62. The method of claim 56 wherein said polynucleotide is an mRNA.
63. A method for labeling a polynucleotide comprising: contacting said polynucleotide with a reactive thiol group to form a thiolated polynucleotide; contacting said thiolated polynucleotide with a signal moiety capable of reacting with said thiolated polynucleotide under appropriate conditions such that said signal moiety is attached to said polynucleotide.
64. The method of claim 63 wherein said step of creating a thiol group comprises contacting said polynucleotide with a γ-S-ATP and a kinase.
65. The method of claim 63 wherein said signal moiety is a biotin.
66. The method of claim 63 wherein said polynucleotide is a DNA.
67. The method of claim 63 wherein said polynucleotide is an RNA.
68. The method of claim 63 wherein said polynucleotide is an mRNA.
69. A method of labeling prokaryotic mRNA comprising: obtaining a population of RNA comprising both stable RNA and mRNA from a prokaryotic organism; increasing the relative percentage of mRNA in said population of RNA comprising the steps of: exposing said population of RNA to a plurality of DNA bait molecules which are complementary to at least a portion of the stable RNA in said population of RNA under such conditions as to allow for the formation of DNA:RNA hybrids; exposing said DNA:RNA hybrids to RNAse H to remove the RNA from said RNA:DNA hybrids, producing a sample comprising of DNA and mRNA; and exposing said sample comprising of DNA and mRNA to DNAse thus increasing the relative percentage of mRNA within said population of mRNA; fragmenting said mRNA to form mRNA fragments; exposing said mRNA fragments to γ-S-ATP and T4 kinase to produce reactive thiol groups at the 5′ ends of said mRNA fragments, thereby forming thiolated mRNA fragments; and exposing said thiolated mRNA fragments to PEO-Iodoacetyl-Biotin such that a stable thio-ether bond is formed between said thiolated mRNA fragments and said PEO-Iodoacetyl-Biotin.
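
By way of illustration only, and without limiting the claims, the following Python sketch shows one way the RNA site complementary to each DNA primer recited in claim 16 might be computed and located in a target sequence. The function names `rna_target_site` and `find_primer_sites` are hypothetical and do not appear in the specification. The sketch assumes perfect Watson-Crick complementarity, whereas actual annealing depends on the hybridization conditions and may tolerate mismatches.

```python
# Illustrative sketch only; helper names are hypothetical, not part of the claims.

# DNA primer sequences recited in claim 16, written 5'->3'.
CLAIM_16_PRIMERS = [
    "CCTACGGTTACCTTGTT",
    "TTAACCTTGCGGCCGTACTC",
    "TCGATTAACGCTTGCACCC",
    "CCTCACGGTTCATTAGT",
    "CCATTATACAAAAGGTAC",
    "CTATAGTAAAGGTTCACGGG",
    "TCGTCATCACGCCTCAGCCT",
    "TCCCACATCGTTTCCCAC",
]

# Each DNA base paired with the RNA base it hybridizes to.
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_target_site(primer: str) -> str:
    """Return the RNA sequence (5'->3') perfectly complementary to a DNA primer."""
    return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(primer.upper()))


def find_primer_sites(rna: str, primers=CLAIM_16_PRIMERS) -> dict:
    """Map each primer to the 0-based positions of its perfect-match site in rna.

    Assumption: only exact matches are reported; mismatch-tolerant annealing
    is not modeled here.
    """
    rna = rna.upper().replace("T", "U")  # accept sequences written as DNA
    sites = {}
    for primer in primers:
        site = rna_target_site(primer)
        positions = []
        start = rna.find(site)
        while start != -1:
            positions.append(start)
            start = rna.find(site, start + 1)
        sites[primer] = positions
    return sites


if __name__ == "__main__":
    for primer in CLAIM_16_PRIMERS:
        print(primer, "->", rna_target_site(primer))
```

A caller would pass a stable RNA sequence of interest to `find_primer_sites` and read off where, if anywhere, each primer has an exact complementary site.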